A road data assets revenue allocation model based on a modified Shapley value approach considering contribution evaluation

This paper constructs a two-layer road data asset revenue allocation model based on a modified Shapley value approach. The first layer allocates revenue to three roles in the data value realization process: the original data collectors, the data processors, and the data product producers. It fully considers and appropriately adjusts the revenue allocation to each role based on data risk factors. The second layer determines the correction factors for different roles to distribute revenue among the participants within those roles. Finally, the revenue values of the participants within each role are synthesized to obtain a consolidated revenue distribution for each participant. Compared to the traditional Shapley value method, this model establishes a revenue allocation evaluation index system, uses entropy weighting and rough set theory to determine the weights, and adopts a fuzzy comprehensive evaluation and numerical analysis to assess the degree of contribution of participants. It fully accounts for differences in both the qualitative and quantitative contributions of participants, enabling a fairer and more reasonable distribution of revenues. This study provides new perspectives and methodologies for the benefit distribution mechanism in road data assets, which aid in promoting the market-based use of road data assets, and it serves as an important reference for the application of data assetization in the road transportation industry.


Road digitalization
As a technological advancement, digitalization has permeated various aspects of the economy and society. Scholars describe the essence of digitalization from the perspectives of product and value creation, stating that it facilitates the transition from one-way to two-way product design, enabling interactive and configurable products and promoting the co-creation of product value.
Within this trend, "road digitalization" collects data through various types of sensing equipment, transmits the data over multi-network converged communication facilities, and applies intelligent analysis and processing to achieve highway control guidance, intelligent decision-making, personalized services, etc. This effectively improves the safety level, traffic efficiency, and management effectiveness of the transportation system.
Some scholars have already conducted research in the field of road digitalization. Singh et al. 3 extensively examined the significance of road digitalization from various aspects such as intelligent lighting systems, smart emergency management systems, and renewable energy. They also described the architectures of intelligent lighting systems and smart emergency management systems. Lu et al. 4 constructed a real-time, vision-based digital model of traffic scenes, which supports the development of digital twins of road traffic to a certain extent. With the support of these digitalization technologies for roads, the application value of road data assets can be fully explored.
Road data assets refer to various digital resources related to roads, including dynamic data such as technical indicators, traffic flow, weather conditions, and vehicle routes. These data can be used for traffic monitoring and prediction 5, signal control, and road condition feedback 6. For example, the Beijing Municipal Commission of Transport has opened up traffic-related data to travel service platforms such as Amap, Baidu Maps, and Meituan, enabling these platforms to provide new features such as bus occupancy rate queries, comprehensive comparison of travel plans, and estimated travel times, which comprehensively improves the level of traffic and travel services.
The digitalization of roads provides a data foundation for road data assets through various sensing devices that collect dynamic road data. The application of road data assets elevates roads from static construction to "networked, sensed, and intelligent" dynamic management, which is the key foundation for smart transportation development. Data asset trading provides an opportunity for the open sharing of road data, creating revenue for relevant transportation enterprises and further enhancing the value and influence of road digitalization. This process advances the scientific, intelligent, and efficient development of road management.

Data assets trading
The transition of data from being perceived as mere objects to being regarded as valuable assets signifies its significant contribution to economic development 7. There are two main ways to realize the economic value of data assets: one is to bring economic benefits indirectly by optimizing business processes and assisting decision-making within the enterprise; the other is to sell the data assets directly to the outside world in the data trading market so that more enterprises can benefit from them and fully activate the value of these data assets.
Data transactions are typically facilitated by three parties: data consumers, data providers, and data markets. Data providers package and submit their data to the data market, which then matches the appropriate data providers with the needs of data consumers. Finally, data providers and consumers interact to finalize the transaction 8. Acting as intermediaries, data markets primarily provide services such as data legality examination, quality assessment, and value evaluation.
Europe and the United States began exploring data trading earlier, and currently active big data trading platforms include Dawex (France), Streamr (Sweden), Advaneo (UK), Otonomo (Israel), and so on. In 2015, China began implementing its big data strategy and established the first domestic big data exchange institution, the Guizhou Big Data Exchange. In 2019, China further proposed that data participate in distribution as a factor of production, and data trading organizations were established one after another in Beijing, Shanghai, and Shenzhen, marking the entry of data trading into a period of rapid development in China.
Data products sold by data trading platforms mainly take forms such as data packages, API interfaces, and data analysis reports. Differences in the form of data products affect the formulation of pricing strategies. Existing data pricing strategies can be classified into six categories: free data, usage-based pricing, package pricing, uniform pricing, freemium pricing, and two-part pricing combining bundle and uniform pricing 9. Data trading cannot be realized unless the precise selling price of the data set is established. Data pricing needs to meet the requirements of revenue maximization, fairness, arbitrage-free pricing, computational efficiency, etc. 9, and many scholars have explored and researched data pricing methods. Liang et al. 10 explored the factors affecting the data price based on the feature price model in terms of the data object, the data seller, and the data buyer. Tian et al. 11 focused on the data seller as the main entity and designed optimal contract mechanisms considering privacy protection in various market scenarios, aiming to achieve individual rationality and incentive compatibility. Oh et al. 12 designed a competitive Internet of Things (IoT) data trading environment consisting of data providers, data brokers, data service providers, and data consumers. They also proposed a unified method for pricing data sets to compare the competitiveness of different data brokers.
In addition to considering the transaction scenario and market supply and demand conditions, data pricing also entails focusing on the inherent value and potential contributions of the data. Some scholars have conducted research on the valuation of data assets from the perspective of the intrinsic characteristics of data. Yu et al. 13 proposed a data pricing model that takes into account data quality and versioning strategies, enabling data quality assessment and market segmentation. Liao et al. 14 quantified user privacy choices and constructed a multi-scenario data property bilateral trading model. Chellappa et al. 15 conducted a detailed analysis of version control strategies for data products and derived the optimal version of data products along with corresponding prices.
In the process of data transactions, various technical means need to be applied to protect data security and the rights of data rights holders. Currently, technologies such as privacy computing 16, blockchain 17, and digital watermarking 18,19 can support the platform's data protection efforts, set up the platform's data protection system, and consider data security and compliance when conducting transactions.
In summary, data assets are a new type of strategic resource, and data trading is one of the important means of realizing their commercial value. Promoting the transaction and utilization of data assets is an important development trend, which can create value for data participants. Data trading platforms should activate the value of data while maintaining data security and complying with regulatory requirements to promote the orderly circulation of data resources. Revenue allocation is the primary task following data asset transactions, serving as a key motivator to stimulate the active participation of enterprises. A fair and equitable revenue allocation mechanism can promote and incentivize deep open sharing and the value creation of data assets.

Revenue allocation methods
Mature methods for allocating the revenue of data assets are still lacking, whereas the Shapley value method based on cooperative game theory is widely used in revenue allocation problems in various fields. The Shapley value method can achieve a unique and fair distribution of asset benefits by calculating the marginal contributions of each participant in different combination scenarios. Luo et al. 20 proposed a rapid calculation method of the accurate Shapley value under independent utility for multi-source datasets, but this method only considers the data owner's benefit allocation and does not cover other participants in the data value chain.
The basic Shapley value method treats all participants as equal in status and distributes benefits based solely on the average marginal contribution, without considering the differentiated contributions of the participants. To overcome this shortcoming and make the revenue allocation reasonable, many scholars have tried to introduce factors such as input cost, risk-taking, and urgent demand into the Shapley value method to reflect the asymmetric contributions of the participants. Wang et al. 21 established a modified Shapley value method based on cloud gravity, taking into account risk, inputs, and service quality, and applied it to the revenue allocation of a private charging pile-sharing project, which significantly improved the effect of multi-party cooperation. Yang et al. 22 constructed a modified Shapley value-based integrated energy system revenue-sharing model based on operational risk factors, which can reflect the actual operational risk and the degree of contribution of participants. Zheng et al. 23 introduced five non-cooperative and cooperative models for a remanufacturing closed-loop supply chain. They considered the bargaining power of alliances as the game's bottom line and proposed a variable-weighted Shapley value method to achieve profit distribution in the supply chain.
The roles and tasks performed by different parties in a cooperative alliance differ, and the Shapley value based on contribution alone cannot fully account for other key factors, such as resource input and risk-taking by the participants, making the benefit distribution scheme unfair to some extent. Therefore, to achieve reasonable benefit distribution and stable cooperation in the data asset value chain, the basic Shapley value method needs to be improved by selecting modifying factors appropriate to the specific conditions of the value chain, so that the contributions, inputs, risks, and other factors of the participants are taken into account in a fair manner.
In conclusion, for the problem of data asset revenue allocation, the Shapley value method has the advantage of yielding a unique and fair allocation, but it also has the defect of considering only the average contribution and ignoring differentiated contributions. To achieve fairness and efficiency in revenue allocation, it is necessary to follow the principle of fair distribution of the Shapley value method, fully consider the differentiated characteristics of each participant in the value chain, and use appropriate modifying factors to design an improved scheme that takes into account both the fairness of revenue allocation and the stability of alliance cooperation. The study of revenue distribution of road data assets by modifying the Shapley value method can achieve fair and reasonable revenue sharing among the participants, stimulate data sharing, cooperation, and innovation, and further optimize road construction and management decisions.

Ethical declaration
The research described in this paper focuses on developing a revenue allocation model for road data assets using a modified Shapley value approach. The data referenced is simulated and does not contain any real or private information about individuals or organizations. All data of the revenue distribution model discussed are entirely fictitious and fabricated for the sole purpose of demonstrating the proposed approach. No actual road assets or transportation systems data has been accessed or analyzed without appropriate consent. This research does not involve the collection of any confidential data or infringement on privacy rights. The study does not aim to cause harm or unfairly benefit any entities. As this is theoretical research for academic purposes only, it does not have any current real-world implications. The research methodology and proposed model strive to maintain ethical standards, avoid conflicts of interest, and uphold principles of fairness and integrity.

Model building

Players
This paper divides the process of realizing the value of highway data assets into three key stages: original data collection, data processing, and data product development. Based on this process, the main stakeholders in distributing revenues from highway data assets can be divided into three roles: first, the original data collectors, namely the initial holders of road data, who complete the original collection of road data and own these data; second, the data processors, who add value to the original data through cleansing, integration, analysis, mining, and other means; third, the data product producers, who utilize the processed data for product design, development, and operation, realizing the full commercial value of the data assets. It should be emphasized that since road data rights can be separated and shared, different interests can be allocated to different stakeholders as needed, and the same participant may also simultaneously take on roles in multiple stages of the data value chain. For example, some transportation operators are responsible for both original data collection and participation in data processing and product design. Therefore, the distribution of revenues from road data assets should be reasonably determined based on the contributions made by each participant at different stages.

Evaluation indicator system for revenue allocation
The traditional Shapley value method only allocates revenues based on marginal contributions, while participants in the same role may have significant differences in costs, risks, and other aspects. These differences need to be fully considered in the revenue distribution process. In addition, generating road data assets requires the participation of original data collectors, data processors, and data product producers. In practice, some participants may simultaneously take on multiple roles. The revenue distribution needs to comprehensively consider their contributions across different roles.
To address these issues, this paper proposes a two-layer allocation mechanism based on the traditional Shapley value method to reasonably distribute revenues from road data assets. The first layer determines the revenue shares for the three roles based on their contributions in the value chain; the second layer further distributes the revenues of each role to the actual participants. Compared to the Shapley value method, which only considers marginal contributions, this two-layer allocation mechanism is more comprehensive and reasonable, as it additionally takes into full consideration the differences in costs and risk sharing among different participants, as well as the contributions of the same participant under different roles. By considering both role contributions and participant efforts, the two-layer allocation mechanism achieves fair and effective revenue distribution.

First layer: role revenue allocation
During the lifecycle of road data assets, all participants face various risks, and those taking on more risks expect higher returns. Therefore, this paper takes the data risk factor as a correction factor for role benefit allocation. According to their sources, data risks are divided into external risks and internal risks. External risks mainly include policy risks and legal risks, as changes in relevant policies and the enactment of laws regarding data assets can significantly impact participants' operations. Internal risks refer to those arising from equipment failures, data security, and other factors during the generation of road data assets, which can be prevented and controlled.

Second layer: revenue allocation among participants of the same role
Based on the characteristics of the three roles (original data collectors, data processors, and data product producers), specific indicators that influence revenue distribution among participants within each role are constructed respectively.
For original data collectors, their contribution lies in planning the collection of high-value original data. This paper employs three indicators (construction cost, data demand, and data characteristics) to adjust the revenue of the original data collection participants. Construction cost covers the major costs involved in the production process of original data, including the sub-indicators of data planning, data collection, and data storage. Data demand is assessed by examining the scarcity and application value of the data in the market and is further divided into two sub-indicators: demand extent and scarcity level. Considering the large volume but low value density of original road data, data coverage and timeliness of updates are included as sub-indicators under data characteristics.
For the data processors, their contribution lies in transforming the original data into high-quality data with application value. Two indicators (data cleansing and data analysis) can be adopted to evaluate the contribution of data processing participants. Data cleansing is considered the fundamental task in data processing, and its effectiveness can be assessed using the sub-indicators of data volume and data quality. As the core of extracting data value, data analysis can be evaluated based on the quality of analysis methods and analysis utility as sub-indicators.
For the data product producers, their contribution lies in developing products and services for end users based on processed data, as well as managing the operation of the products. Two indicators (product development and product maintenance) can be employed to assess the contribution of data product producers. Product development is evaluated based on the workload and difficulty coefficient as sub-indicators, while product maintenance is evaluated based on stability and update frequency as sub-indicators.
Overall, the evaluation indicator system for road data asset revenue allocation constructed in this paper comprehensively considers the contributions of different roles and participants in the road data asset value chain. The specific indicators are illustrated in Fig. 1, and Table 2 provides detailed explanations of the definitions, calculation methods, and value ranges for each indicator.

Conventional Shapley value
The Shapley value method is a cooperative game approach used to solve the problem of profit distribution in multi-party cooperation. It determines the allocation of profits for each participant based on their marginal contributions. It is known for its simple model construction, easy solvability, and unique solutions, allowing for a balance between efficiency and fairness in the distribution process.

First layer: role revenue allocation
Suppose that, in a road data asset revenue distribution, the three roles of original data collectors, data processors, and data product producers are represented by the set $R = \{1, 2, 3\}$. For any subset $s \subseteq R$ (representing any combination of roles in the role set), there exists a real-valued function $v(s)$ satisfying:

$$v(\varnothing) = 0, \qquad v(s_1 \cup s_2) \ge v(s_1) + v(s_2), \quad s_1 \cap s_2 = \varnothing,\ s_1, s_2 \subseteq R \tag{1}$$

$[R, v]$ is termed the cooperation strategy of the three roles, and $v$ represents the characteristic function of the cooperation strategy.
$x_i$ denotes the portion of the maximum revenue $v(R)$ from the road data asset that role $i$ receives. Based on the cooperative strategy $[R, v]$, the income distribution among the three roles is represented by $x = (x_1, x_2, x_3)$. A successful cooperative strategy must satisfy the following conditions:

$$\sum_{i=1}^{3} x_i = v(R), \qquad x_i \ge v(i), \quad i = 1, 2, 3 \tag{2}$$

The Shapley value of role $i$, which gives its revenue allocation, is:

$$\phi_i(v) = \sum_{s \in s_i} w(|s|)\,[v(s) - v(s \setminus i)], \qquad w(|s|) = \frac{(|R| - |s|)!\,(|s| - 1)!}{|R|!} \tag{3}$$

where $s_i$ is the set containing all subsets of $R$ that include role $i$, $|s|$ is the number of elements in subset $s$, $w(|s|)$ is the weighting factor, $v(s)$ is the revenue for subset $s$, and $v(s \setminus i)$ represents the revenue that can be obtained by removing role $i$ from subset $s$.
Therefore, the Shapley value method is applied to evaluate the contributions of the three roles in the road data asset, and the calculations for revenue allocation are presented in Table 3.
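The first-layer calculation can be illustrated with a minimal sketch of the conventional Shapley value for the three roles. The coalition revenues in `REVENUE` are hypothetical numbers chosen only for illustration (they are not the values of Table 3), and the function name `shapley_values` is an assumption of this sketch.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Return {player: Shapley value} for a characteristic function v(frozenset)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Enumerate every coalition s that contains player i.
        for k in range(len(others) + 1):
            for rest in combinations(others, k):
                s = frozenset(rest) | {i}
                # Weighting factor w(|s|) = (n - |s|)! (|s| - 1)! / n!
                weight = factorial(n - len(s)) * factorial(len(s) - 1) / factorial(n)
                # Marginal contribution of i to coalition s.
                total += weight * (v(s) - v(s - {i}))
        phi[i] = total
    return phi


# Hypothetical coalition revenues: 1 = collectors, 2 = processors, 3 = producers.
REVENUE = {
    frozenset(): 0.0,
    frozenset({1}): 10.0,
    frozenset({2}): 5.0,
    frozenset({3}): 5.0,
    frozenset({1, 2}): 30.0,
    frozenset({1, 3}): 25.0,
    frozenset({2, 3}): 15.0,
    frozenset({1, 2, 3}): 60.0,
}

phi = shapley_values([1, 2, 3], REVENUE.__getitem__)
```

With these illustrative inputs the three role allocations sum to the grand-coalition revenue of 60, which is the efficiency property required of a successful cooperative strategy. The same function could be reused within a single role for the second layer, provided a characteristic function over that role's participants is available.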

Second layer: revenue allocation among participants of the same role
Once the Shapley values $\Phi(v) = (\phi_1(v), \phi_2(v), \phi_3(v))$ for revenue allocation among the three roles in a road data asset are determined, the benefit $\phi_i(v)$ of each role must be distributed among the participants holding that role, so that the benefits from the road data asset reach each participant.
Assume that there are $n$ participants in a road data asset revenue allocation and that the numbers of participants in the roles of original data collectors, data processors, and data product producers are $n_i$ ($i = 1, 2, 3$). Because one participant may hold several roles, it is clear that $n \le n_1 + n_2 + n_3$. Denote by $\varphi^i_j(v)$ the profit obtained by the $j$th participant when distributing the profit $\phi_i(v)$ of role $i$:

$$\varphi^i_j(v) = \sum_{s \in s^i_j} w(|s|)\,[v(s) - v(s \setminus j)], \qquad w(|s|) = \frac{(n_i - |s|)!\,(|s| - 1)!}{n_i!} \tag{4}$$

where $s^i_j$ represents the set of all subsets of participants within role $i$ that include participant $j$, $|s|$ is the number of elements in subset $s$, $w(|s|)$ is the weighting factor, $v(s)$ is the profit for subset $s$, and $v(s \setminus j)$ denotes the profit that can be obtained by excluding participant $j$ from subset $s$.

Stability
The running status of the product needs to be monitored, and any faults or errors that occur need to be addressed and fixed promptly. The value is the average number of stable days per month for the product.

Update frequency
As market demands change, the product's functions and features need continuous improvement to provide greater value and user experience. The value is the average interval time of product innovation.

Synthesis of revenue allocation among participants
After calculating the revenue distribution for each participant within each role, it is necessary to synthesize the revenue distribution among participants under different roles, taking into account their contributions at different stages.
Let $N = \{1, 2, \ldots, n\}$ be the set of participants, and let $N_i = \{1, 2, \ldots, n_i\}$ ($i = 1, 2, 3$) represent the sets of participants for the roles of original data collectors, data processors, and data product producers, respectively. Clearly $N_i \subseteq N$. Because the sets $N$ and $N_i$ differ in size and in the order of their elements, we define a function $f_i: N \to N_i$ that maps each element $x$ ($x = 1, 2, \ldots, n$) of $N$ to the corresponding element $j \in N_i$ with $f_i(x) = j$ if such an element exists; otherwise $x$ has no corresponding element in $N_i$. Therefore, the profit distribution for each participant $x$ in role $i$ can be represented as $\varphi^i_x(v)$, where:

$$\varphi^i_x(v) = \begin{cases} \varphi^i_{f_i(x)}(v), & \exists\, j \in N_i : f_i(x) = j \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

Synthesizing the profit values of each participant over the different roles gives the total profit distribution $\phi_x(v)$ for participant $x$ in the road data asset:

$$\phi_x(v) = \sum_{i=1}^{3} \varphi^i_x(v) \tag{6}$$
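The synthesis step can be sketched in a few lines. In this sketch the mapping $f_i$ is represented implicitly by keying each role's allocation on a shared participant identifier, so a participant who is absent from a role simply contributes zero for that role. The participant labels and revenue figures are hypothetical.

```python
# Hypothetical within-role allocations: role -> {participant id: revenue}.
role_allocations = {
    "collector": {"A": 12.0, "B": 8.0},
    "processor": {"A": 5.0, "C": 10.0},
    "producer": {"C": 7.0, "D": 18.0},
}


def synthesize(allocations_by_role):
    """Sum each participant's revenue over every role held (zero where absent)."""
    totals = {}
    for allocation in allocations_by_role.values():
        for participant, revenue in allocation.items():
            totals[participant] = totals.get(participant, 0.0) + revenue
    return totals


totals = synthesize(role_allocations)
```

Here participants A and C each hold two roles, so their consolidated revenue is the sum of two within-role allocations, while B and D receive only their single-role amounts.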

The modified Shapley value
The traditional Shapley value method only determines revenue allocation based on marginal contributions, without considering differences among participants in terms of costs and risks. To achieve fair profit distribution of road data assets, it is essential to comprehensively evaluate the differences among roles and participants in terms of input costs, risk allocation, and other aspects. In this paper, based on the traditional Shapley value method, a revenue allocation evaluation indicator system for the road data asset, as depicted in Fig. 1, is established. This indicator-driven two-layer allocation correction scheme is used to modify the revenue allocation among different roles and participants. By doing so, a more equitable and reasonable revenue allocation model for the road data asset is developed. The architecture of the improved revenue allocation model for the road data asset is illustrated in Fig. 2.
Table 3. Road data asset role revenue allocation.
(a) Revenue allocation for the original data collectors. (b) Revenue allocation for the data processors. (c) Revenue allocation for the data product producers.

Calculation of weights for evaluating revenue allocation of road data assets
Calculation of primary indicator weight. This study utilizes the entropy weighting method to calculate the weights of the primary evaluation indicators in Fig. 1. It is assumed that $m$ experts are invited to evaluate the importance of $I$ primary indicators, yielding a scoring matrix $S = (s_{ij})_{m \times I}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, I$, where $s_{ij}$ represents the rating provided by the $i$th expert for the $j$th indicator.
If $j$ denotes a profit-related indicator, the scores are normalized according to Eq. (7); if $j$ denotes a cost-related indicator, they are normalized according to Eq. (8). Let $\tilde{s}_{ij}$ denote the normalized score. The weights $p_{ij}$ of the scores given by different experts to each indicator are calculated using the entropy weighting method, as shown in Eq. (9):

$$p_{ij} = \frac{\tilde{s}_{ij}}{\sum_{i=1}^{m} \tilde{s}_{ij}} \tag{9}$$

The information entropy value $e_j$ is calculated separately for each indicator $j$ according to $p_{ij}$:

$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij} \tag{10}$$

To ensure that the entropy value $e_j$ holds numerical significance, we set $\ln p_{ij} = 0$ when $p_{ij} = 0$. The entropy weight $\omega_j$ for each indicator is then calculated based on the entropy value $e_j$, as follows:

$$\omega_j = \frac{1 - e_j}{\sum_{k=1}^{I} (1 - e_k)} \tag{11}$$

Calculation of secondary indicator weight. For the secondary evaluation indicators in Fig. 1, rough set theory is employed in this study to calculate the indicator weights. It is assumed that $m$ experts are invited to assess the importance of the $I_j$ secondary indicators under the $j$th ($j = 1, 2, \ldots, I$) primary indicator, leading to the construction of an evaluation information system $S_j = (U, A_j, V_j, f)$, where $U = \{1, 2, \ldots, m\}$ is the universe of discourse, $A_j = \{a_1, a_2, \ldots, a_{I_j}\}$ is a non-empty finite attribute set, and the attribute value domain $V_j$ is obtained through expert assessment using a percentage-based scoring system. Moreover, $f$ represents the relationship set between $U$ and $A_j$, also referred to as the information function set.
Definition 1 Let $R$ be an equivalence relation on $U$. The family of equivalence classes that $R$ induces on $U$, denoted $U/\mathrm{ind}(R)$, is referred to as the partition of $U$, and each of its elements is called an equivalence class.
In an information system $S_j$, different attributes have varying effects, and some attributes may even be redundant. Therefore, it is necessary to eliminate irrelevant or unimportant knowledge from the information system while maintaining its classification ability. This process is known as knowledge reduction. Knowledge reduction is divided into attribute reduction and attribute value reduction. However, since attribute value reduction is relatively straightforward, knowledge reduction generally refers to attribute reduction in most cases.
Definition 2 If $\mathrm{ind}(R) = \mathrm{ind}(R - \{r\})$, then $r$ is referred to as reducible knowledge in $R$. If $P = R - \{r\}$ is independent, then $P$ is a knowledge reduction of $R$.
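In the information system $S_j$, the set of secondary indicators for the primary indicator $j$ is denoted as $A_j = \{a_1, a_2, \ldots, a_{I_j}\}$. Assume that $A_j$ divides $U$ into $l_j$ sets, represented as $U/\mathrm{ind}(A_j) = \{X_1, X_2, \ldots, X_{l_j}\}$. The information quantity of $A_j$ is calculated as:

$$I(A_j) = \sum_{i=1}^{l_j} \frac{|X_i|}{|U|} \left(1 - \frac{|X_i|}{|U|}\right) = 1 - \frac{1}{|U|^2} \sum_{i=1}^{l_j} |X_i|^2 \tag{13}$$

where $|U|$ represents the number of elements in the universe of discourse $U$, and $|X_i|$ denotes the number of elements in the $i$th set.
In the information system $S_j$, for the knowledge reduction $\mathrm{ind}(A_j - \{a\})$ of any $a \in A_j$, let the partition of $U$ after reduction consist of $l_a$ sets, denoted as $U/\mathrm{ind}(A_j - \{a\}) = \{X_1, X_2, \ldots, X_{l_a}\}$. The information quantity of $A_j - \{a\}$ is given by:

$$I(A_j - \{a\}) = 1 - \frac{1}{|U|^2} \sum_{i=1}^{l_a} |X_i|^2 \tag{14}$$

Therefore, the importance of $a$ in $A_j$ can be expressed as:

$$\mathrm{sig}_{A_j}(a) = I(A_j) - I(A_j - \{a\}) \tag{15}$$

The weights of the secondary indicators $A_j = \{a_1, a_2, \ldots, a_{I_j}\}$ under the primary indicator $j$ can be calculated based on their importance using the equation:

$$\omega_{A_j}(a_k) = \frac{\mathrm{sig}_{A_j}(a_k)}{\sum_{i=1}^{I_j} \mathrm{sig}_{A_j}(a_i)}, \quad k = 1, 2, \ldots, I_j \tag{16}$$

By incorporating the entropy weight $\omega_j$ of the primary indicator $j$, the final weights of the secondary indicators $A_j = \{a_1, a_2, \ldots, a_{I_j}\}$ can be determined as:

$$\omega(a_k) = \omega_j \, \omega_{A_j}(a_k), \quad k = 1, 2, \ldots, I_j \tag{17}$$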
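The two weighting steps can be sketched together as follows. Several points are assumptions of this sketch rather than statements of the paper's method: min-max normalization stands in for Eqs. (7) and (8); every primary indicator is assumed to receive at least two distinct scores; the expert ratings used for the rough set step are assumed to have been discretized into grades beforehand; and at least one secondary indicator is assumed to have non-zero importance. All scores are hypothetical.

```python
from math import log


def entropy_weights(scores, is_cost):
    """Entropy weights for the columns of an m-experts x I-indicators score table."""
    m = len(scores)
    divergence = []
    for j in range(len(scores[0])):
        col = [row[j] for row in scores]
        lo, hi = min(col), max(col)
        # Assumed min-max normalization; cost-type indicators are reversed.
        norm = [(hi - x) / (hi - lo) if is_cost[j] else (x - lo) / (hi - lo) for x in col]
        total = sum(norm)
        p = [x / total for x in norm]
        # Terms with p = 0 are skipped, i.e. p * ln(p) is taken as 0.
        e_j = -sum(x * log(x) for x in p if x > 0) / log(m)
        divergence.append(1.0 - e_j)
    return [d / sum(divergence) for d in divergence]


def partition(table, attrs):
    """Group object indices whose values agree on every attribute in attrs."""
    blocks = {}
    for obj, row in enumerate(table):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(obj)
    return list(blocks.values())


def information_quantity(table, attrs):
    """I(A) = 1 - sum(|X_i|^2) / |U|^2 over the partition induced by attrs."""
    n = len(table)
    return 1.0 - sum(len(block) ** 2 for block in partition(table, attrs)) / n ** 2


def rough_set_weights(table):
    """Weight each attribute by the information lost when it is removed."""
    attrs = list(range(len(table[0])))
    full = information_quantity(table, attrs)
    sig = [full - information_quantity(table, [a for a in attrs if a != r]) for r in attrs]
    return [x / sum(sig) for x in sig]


# Hypothetical ratings: 3 experts x 2 primary indicators (both profit-type).
primary_w = entropy_weights([[80, 60], [90, 70], [70, 90]], is_cost=[False, False])

# Hypothetical discretized grades: 4 experts x 2 secondary indicators.
secondary_w = rough_set_weights([(1, 1), (1, 2), (2, 2), (2, 2)])

# Final weights of the secondary indicators under primary indicator 0.
final_w = [primary_w[0] * w for w in secondary_w]
```

Because the final weights are products of a primary weight and within-group weights that sum to one, the final weights under each primary indicator sum to that indicator's entropy weight.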

Evaluation of revenue allocation indicators for road data assets
Once the weights of the revenue allocation evaluation indicators for road data assets are determined, it is necessary to numerically evaluate different schemes under the relevant indicators. As some indicators involve subjective measures and others are objective numerical metrics, different methods are required to quantify both subjective and objective factors for an effective assessment of revenue allocation indicators for road data assets.
Defining a scheme as a collective term for the subjects involved in revenue allocation across different layers, a scheme represents a role at the first layer and a participant within a role at the second layer. Assuming that there are $D$ schemes involved in the distribution of a road data asset, scheme $d$ ($d = 1, 2, \ldots, D$) requires a comprehensive evaluation of all secondary indicators under the $I$ primary indicators. Let there be $I_j$ secondary indicators under the $j$th ($j = 1, 2, \ldots, I$) primary indicator, with indicator set $A_j = \{a_1, a_2, \ldots, a_{I_j}\}$, of which $\dot{I}_j$ are subjective indicators and $\ddot{I}_j$ are objective indicators, with $\dot{I}_j + \ddot{I}_j = I_j$. Let the set of subjective secondary indicators under the $j$th primary indicator be $A'_j = \{a'_1, a'_2, \ldots, a'_{\dot{I}_j}\}$, and the set of objective secondary indicators be $A''_j = \{a''_1, a''_2, \ldots, a''_{\ddot{I}_j}\}$.
For the subjective indicators, the evaluation of scheme $d$ is collected in the fuzzy evaluation matrix:

$$R_j(d) = \begin{bmatrix} r^1_{a'_1}(d) & r^2_{a'_1}(d) & \cdots & r^5_{a'_1}(d) \\ \vdots & \vdots & & \vdots \\ r^1_{a'_{\dot{I}_j}}(d) & r^2_{a'_{\dot{I}_j}}(d) & \cdots & r^5_{a'_{\dot{I}_j}}(d) \end{bmatrix} \tag{18}$$

Among them, $r^1_{a'_i}(d), r^2_{a'_i}(d), \ldots, r^5_{a'_i}(d)$ respectively represent the frequency distribution of indicator $a'_i$ ($i = 1, 2, \ldots, \dot{I}_j$) under the five comments of low, moderately low, moderate, moderately high, and high.
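Based on the indicator weights calculated according to Eq. (17), the subjective indicator weight vector for the indicator set $A'_j$ is denoted as $W'_j = [\omega(a'_1), \omega(a'_2), \ldots, \omega(a'_{\dot{I}_j})]$. Using this weight vector and the fuzzy evaluation matrix $R_j(d)$, whose $i$th row holds the comment frequencies $r^1_{a'_i}(d), \ldots, r^5_{a'_i}(d)$ of indicator $a'_i$, the fuzzy evaluation vector is obtained as:

$$T_j(d) = W'_j \, R_j(d) \tag{19}$$

where $T_j(d)$ is referred to as the fuzzy evaluation vector.
Using the membership degree of the comment set {low, moderately low, moderate, moderately high, high}, the membership degree vector $V = [0.1, 0.3, 0.5, 0.7, 0.9]$ can be determined. From this, the evaluation value $L'_j(d)$ of the subjective component for indicator $j$ can be calculated as:

$$L'_j(d) = T_j(d) \, V^{\mathsf{T}} \tag{20}$$

The subjective evaluation values for the $I$ primary indicators are synthesized as:

$$L'(d) = \sum_{j=1}^{I} L'_j(d) \tag{21}$$

where $L'(d)$ is termed the subjective evaluation value of scheme $d$.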
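The subjective evaluation of one primary indicator can be sketched as below. The sketch assumes that the fuzzy composition is the ordinary weighted average (a matrix product of the weight vector and the frequency matrix), which is one common choice of operator; the weights and comment frequencies are hypothetical.

```python
# Membership degrees for: low, moderately low, moderate, moderately high, high.
MEMBERSHIP = [0.1, 0.3, 0.5, 0.7, 0.9]


def subjective_value(weights, frequencies):
    """Evaluate one primary indicator from its subjective secondary indicators.

    weights: final weight of each subjective secondary indicator.
    frequencies: one row per indicator with the share of experts choosing
                 each of the five comments.
    """
    # Fuzzy evaluation vector: weighted sum of the rows, comment by comment.
    t = [sum(w * row[k] for w, row in zip(weights, frequencies)) for k in range(5)]
    # Collapse the vector to a single value using the membership degrees.
    return sum(tk * vk for tk, vk in zip(t, MEMBERSHIP))


# Hypothetical final weights and expert comment frequencies for one scheme.
value = subjective_value(
    weights=[0.2, 0.1],
    frequencies=[
        [0.0, 0.1, 0.2, 0.5, 0.2],
        [0.1, 0.2, 0.4, 0.3, 0.0],
    ],
)
```

Summing such values over all primary indicators would give the subjective evaluation value of the scheme.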

Objective evaluation
In the set of objective indicators $A''_j = \{a''_1, a''_2, \ldots, a''_{\ddot{I}_j}\}$, the numerical value of each indicator for scheme d is represented by a vector, denoted as $f_d(A''_j) = [f_d(a''_1), f_d(a''_2), \ldots, f_d(a''_{\ddot{I}_j})]$. For the indicator $a''_i$ ($i = 1, 2, \ldots, \ddot{I}_j$), the values are normalized over the D schemes according to Eq. (22) if it is a revenue indicator and according to Eq. (23) if it is a cost indicator. This normalization yields the normalized value vector $\bar{f}_d(A''_j) = [\bar{f}_d(a''_1), \bar{f}_d(a''_2), \ldots, \bar{f}_d(a''_{\ddot{I}_j})]$ of the objective indicators in $A''_j$. Based on this vector and the corresponding weight vector $\omega_{A''_j}$, the evaluation value $L''_j(d)$ of the objective part of indicator j is calculated as:

$$L''_j(d) = \omega_{A''_j} \cdot \bar{f}_d(A''_j)^{T}$$

The evaluation values of the objective component of the I primary indicators are then synthesized as:

$$L''(d) = \sum_{j=1}^{I} L''_j(d)$$

where $L''(d)$ is called the objective evaluation value of scheme d.
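A minimal sketch of the objective evaluation is given below. Eqs. (22) and (23) are not reproduced in this section, so the min-max normalization used here is an assumption and the original equations may take a different form. The function names, indicator values and weights are hypothetical.

```python
def normalize(values, kind):
    """Min-max normalize one indicator across all schemes (assumed form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All schemes tie on this indicator; treat them equally (arbitrary choice)
        return [1.0] * len(values)
    if kind == "revenue":   # larger raw value is better
        return [(x - lo) / (hi - lo) for x in values]
    if kind == "cost":      # smaller raw value is better
        return [(hi - x) / (hi - lo) for x in values]
    raise ValueError(f"unknown indicator kind: {kind}")


def objective_values(raw, kinds, weights):
    """raw[d][i] is the value of indicator i for scheme d; returns L'' per scheme."""
    columns = [normalize([row[i] for row in raw], kinds[i])
               for i in range(len(kinds))]
    return [sum(w * columns[i][d] for i, w in enumerate(weights))
            for d in range(len(raw))]


# Hypothetical data: three schemes, one revenue indicator and one cost indicator
raw = [[120.0, 30.0], [80.0, 50.0], [100.0, 40.0]]
kinds = ["revenue", "cost"]
weights = [0.25, 0.15]
values = objective_values(raw, kinds, weights)
print([round(x, 3) for x in values])
# [0.4, 0.0, 0.2]
```

With min-max scaling the weakest scheme on an indicator receives 0 for that indicator, which is a strong penalty when only two or three schemes are compared; a proportional normalization would be a milder alternative.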

Integration of objective and subjective evaluations
Combine the subjective and objective evaluation values for scheme d to obtain the composite evaluation value:

$$L(d) = \alpha L'(d) + (1 - \alpha) L''(d) \qquad (26)$$

where L(d) is the composite evaluated value of scheme d and α (0 ≤ α ≤ 1) is the weighting factor, allowing for the adjustment of the importance of subjective and objective evaluation values in the composite evaluated value. Normalize the composite evaluated value L(d) of scheme d:

$$\bar{L}(d) = \frac{L(d)}{\sum_{d'=1}^{D} L(d')} \qquad (27)$$

The modification of road data asset revenue allocation

Role revenue allocation modification
Based on Eq. (3), the initial allocations for the roles R = {1, 2, 3} of the original data collectors, data processors, and data product producers can be computed, denoted as $[\phi_1(v), \phi_2(v), \phi_3(v)]$ with $\sum_{i=1}^{3} \phi_i(v) = v(R)$, where v(R) represents the maximum revenue for the road data asset.
As illustrated in Fig. 2a, based on the model in Section "Evaluation of revenue allocation indicators for road data assets", the comprehensive evaluation values $\bar{L}_R(1)$, $\bar{L}_R(2)$, and $\bar{L}_R(3)$ for the roles of the original data collectors, data processors, and data product producers can be calculated.
Next, compute the role revenue allocation modification factor:

$$\Delta L_R(i) = \bar{L}_R(i) - \frac{1}{3}, \quad i = 1, 2, 3$$

The modified value of the role's revenue allocation is:

$$\varphi_i(v) = \phi_i(v) + \Delta L_R(i) \, v(R)$$

Participant revenue allocation modification within the same role
Suppose there are n participants involved in the distribution of road data asset profits, and the number of participants in the roles of data collectors, data processors, and data product producers is denoted as $n_i$ (i = 1, 2, 3). According to Eq. (4), we determine the initial distribution scheme $[\phi^i_1(v), \phi^i_2(v), \ldots, \phi^i_{n_i}(v)]$ for participants within role i based on the role revenue allocation modified value $\varphi_i(v)$, where $\sum_{k=1}^{n_i} \phi^i_k(v) = \varphi_i(v)$.
As shown in Fig. 2b, applying the model in Section "Evaluation of revenue allocation indicators for road data assets", we can calculate the comprehensive evaluation values $[\bar{L}_1(1), \bar{L}_1(2), \ldots, \bar{L}_1(n_1)]$ for participants within the data collectors. To modify the revenue allocation for participants within the data collectors, we compute the participant modification factor as follows:

$$\Delta L_1(k) = \bar{L}_1(k) - \frac{1}{n_1}, \quad k = 1, 2, \ldots, n_1$$

The modified values for participant revenue allocation within the data collectors are then given by:

$$\varphi^1_k(v) = \phi^1_k(v) + \Delta L_1(k) \, \varphi_1(v)$$

Similarly, using the model in Section "Evaluation of revenue allocation indicators for road data assets", we can calculate the comprehensive evaluation values $[\bar{L}_2(1), \bar{L}_2(2), \ldots, \bar{L}_2(n_2)]$ for participants within the data processors. For the data processors, the participant revenue allocation modification factor is calculated as follows:

$$\Delta L_2(k) = \bar{L}_2(k) - \frac{1}{n_2}, \quad k = 1, 2, \ldots, n_2$$

The modified values for participant revenue allocation within the data processors are then obtained as:

$$\varphi^2_k(v) = \phi^2_k(v) + \Delta L_2(k) \, \varphi_2(v)$$

Likewise, considering the model in Section "Evaluation of revenue allocation indicators for road data assets", we can compute the comprehensive evaluation values $[\bar{L}_3(1), \bar{L}_3(2), \ldots, \bar{L}_3(n_3)]$ for participants within the data product producers. To modify the revenue allocation for participants within the data product producers, we calculate the participant revenue allocation modification factor as follows:

$$\Delta L_3(k) = \bar{L}_3(k) - \frac{1}{n_3}, \quad k = 1, 2, \ldots, n_3$$

Finally, the modified values for participant revenue allocation within the data product producers are given by:

$$\varphi^3_k(v) = \phi^3_k(v) + \Delta L_3(k) \, \varphi_3(v)$$

Final revenue allocation scheme for participants
As depicted in Fig. 2c, using Eq. (5), we determine the revenue allocation modified values for the n participants across the three roles. By synthesizing the profit values of each participant across the different roles, we obtain the final revenue allocation value for each participant involved in the road data asset.
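The synthesis step can be sketched as a per-enterprise sum over roles. Reading "synthesizing" as summation is an assumption, although it is the reading under which the total revenue is fully distributed. The role memberships below follow a pattern in which some enterprises act in two roles; all amounts are hypothetical placeholders, not results from the paper.

```python
from collections import defaultdict


def synthesize(role_allocations):
    """Sum each enterprise's modified allocation over every role it takes part in."""
    totals = defaultdict(float)
    for allocation in role_allocations.values():
        for enterprise, amount in allocation.items():
            totals[enterprise] += amount
    return dict(totals)


# Hypothetical modified allocations per role (unit: ten thousand RMB)
role_allocations = {
    "collectors": {1: 30.0, 2: 34.0},
    "processors": {2: 2.0, 3: 4.0, 4: 3.0},
    "producers": {4: 10.0, 5: 13.0},
}
final = synthesize(role_allocations)
print(final)
# {1: 30.0, 2: 36.0, 3: 4.0, 4: 13.0, 5: 13.0}
```

The final values add up to the same total as the role-level values, so no revenue is created or lost in the synthesis.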

Case study
Assume that the sale of a road data asset obtains total proceeds of 960,000 RMB, and that this revenue needs to be allocated to the five enterprises N = {1, 2, 3, 4, 5} involved in data collection, processing and production. According to the process of realizing the value of road data, the enterprises can be divided into three types of roles R = {1, 2, 3}: the original data collectors, the data processors and the data product producers. The sets of participating enterprises under the three types of roles are $N_1 = \{1, 2\}$, $N_2 = \{2, 3, 4\}$, and $N_3 = \{4, 5\}$, respectively. Based on our investigation, selling the original data directly can generate revenue of 300,000 RMB, while processing the original data and selling it can bring in revenue of 420,000 RMB. Developing the original data into data products and selling them can yield revenue of 660,000 RMB. Without the original data, neither the data processors nor the data product producers can generate any revenue, regardless of whether they operate individually or in cooperation.

Table 4. Revenue situation of participating enterprise combinations under each role (unit: ten thousand RMB).

The income values and indicator values for the participating enterprises in each role are reasonably assumed, as shown in Tables 4, 5, 6 and 7.

Role revenue allocation
With reference to the revenue data in Table 3, the initial revenue allocation for the original data collectors, the data processors, and the data product producers is calculated using the traditional Shapley value method, as presented in Table 8.
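The initial role allocation can be reproduced with a direct Shapley computation. The characteristic function below is an interpretation of the case description (in ten thousand RMB): collectors alone earn 30, collectors with processors 42, collectors with producers 66, all three roles 96, and any coalition without the collectors earns 0. It is an assumption read from the text, not a copy of Table 3, and the permutation form used here is equivalent to the standard Shapley formula.

```python
from itertools import permutations

# Assumed coalition revenues for roles 1 (collectors), 2 (processors), 3 (producers)
REVENUE = {
    frozenset(): 0,
    frozenset({1}): 30,
    frozenset({2}): 0,
    frozenset({3}): 0,
    frozenset({1, 2}): 42,
    frozenset({1, 3}): 66,
    frozenset({2, 3}): 0,
    frozenset({1, 2, 3}): 96,
}


def v(coalition):
    """Revenue of a coalition of roles."""
    return REVENUE[frozenset(coalition)]


def shapley(players, value):
    """Average marginal contribution of each player over all join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            phi[p] += value(joined) - value(coalition)
            coalition = joined
    return {p: total / len(orders) for p, total in phi.items()}


role_shapley = shapley([1, 2, 3], v)
print(role_shapley)
# {1: 60.0, 2: 12.0, 3: 24.0}
```

Under this reading the three initial values sum to 96, the total proceeds of 960,000 RMB.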
First, the weights of the evaluation indexes for the role revenue allocation are determined. Ten experts in the field of road data assets were asked to evaluate the importance of the two primary indicators, external risk and internal risk, using a 1-9 scale. The weights of these indicators were then determined using the entropy weight method. The scoring results provided by the experts are presented in Table 9.
According to Eqs. (7) and (9), the scoring results were normalized and the weights $p_{ij}$ were calculated as shown in Table 10.
The information entropy values and entropy weights of the indicators were calculated according to Eqs. (10) and (11), as shown in Table 11. The weights for the primary evaluation indicators of the roles, denoted as $\omega_1 = 0.444$ and $\omega_2 = 0.556$, were obtained.
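The entropy weighting step can be sketched as follows. This is the textbook form of the entropy weight method and may differ in detail from Eqs. (7) to (11), for example in how scores are normalized before the shares are computed. The expert scores are hypothetical, since Table 9 is not reproduced here.

```python
import math


def entropy_weights(scores):
    """Entropy weights for indicators scored by several experts.

    scores[i][j] is the positive score given by expert i to indicator j.
    """
    n = len(scores)
    m = len(scores[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        # Share of each expert's score within the indicator column
        shares = [x / total for x in column]
        # Information entropy, scaled to [0, 1] by ln(n)
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical 1-9 scores from four experts for two primary indicators
scores = [[5, 2], [6, 9], [5, 3], [6, 8]]
weights = entropy_weights(scores)
print([round(w, 3) for w in weights])
```

The indicator on which the experts disagree more has lower entropy and therefore receives the larger weight. If every expert gave identical scores on every indicator, all divergences would be zero and the final division would fail, so a real implementation would need a fallback such as equal weights.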
The secondary evaluation indicators for the roles were scored on a percentage scale, with higher scores indicating greater importance of the indicators.The scoring results are presented in Table 12.
To facilitate further analysis and capture more common features in the sample data, it is necessary to abstract the indicator scores into higher-level data. To keep the model simple, an unsupervised distance-based method was employed in this study to classify the expert scoring results into three categories, as shown in Table 13. In future research, more scientifically designed and applicable classification methods can be developed based on the characteristics of the scoring data to enhance effectiveness and reliability.
The weights of the secondary indicators under external risks and internal risks were calculated according to Eqs. (13)-(16), as shown in Table 14. It is worth noting that the elements within the sets in Table 14 correspond to the indices of the scoring experts in Table 13.
The evaluation of the secondary indicators under external risks and internal risks for each role is subjective. The evaluation process for the indicators of each role is shown in Table 15.
Since the evaluation indicators for the original data collectors, data processors, and data product producers are all subjective indicators, we take α = 1 in Eqs. (26) and (27) and calculate the normalized comprehensive correction values for the three categories of roles as $\bar{L}_R(1) = 0.385$, $\bar{L}_R(2) = 0.295$, and $\bar{L}_R(3) = 0.320$. Furthermore, we can calculate the modified revenue allocation values for the three categories of roles as $\varphi_1(v) = 64.960$, $\varphi_2(v) = 8.320$, and $\varphi_3(v) = 22.720$.
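The modified values can be checked numerically with a small sketch. The additive adjustment used here (normalized evaluation value minus the equal share 1/n, scaled by the revenue being divided) is an assumption about the form of the modification. The initial role values 60, 12 and 24 are recomputed from the case description rather than read from Table 8. The second call applies the same adjustment to the data processors, using the initial values and evaluation values reported for that role in the case study.

```python
def modify(initial, scores, total):
    """Shift each initial share by (normalized score - equal share) * total."""
    n = len(initial)
    return [phi + (s - 1.0 / n) * total for phi, s in zip(initial, scores)]


# First layer: the three roles (unit: ten thousand RMB)
role_initial = [60.0, 12.0, 24.0]        # assumed Shapley values of the roles
role_scores = [0.385, 0.295, 0.320]      # normalized evaluation values
role_modified = modify(role_initial, role_scores, total=96.0)
print([round(x, 3) for x in role_modified])
# [64.96, 8.32, 22.72]

# Second layer: Enterprises 2, 3 and 4 within the data processors
proc_initial = [1.874, 3.673, 2.773]
proc_scores = [0.296, 0.365, 0.339]
proc_modified = modify(proc_initial, proc_scores, total=8.320)
print([round(x, 3) for x in proc_modified])
```

The first-layer output matches the reported 64.960, 8.320 and 22.720. The second-layer output agrees with the reported 1.564, 3.936 and 2.820 to within about 0.001, a gap that is consistent with rounding of the published inputs. Because the normalized scores sum to 1, the adjustments cancel out and the total being divided is preserved.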

Revenue allocation among participants of the same role
Revenue allocation among participants of the original data collectors
According to Eq. (4), the modified revenue allocation of the original data collectors is distributed to Enterprise 1 and Enterprise 2 as $\phi^1_1(v) = 29.48$ and $\phi^1_2(v) = 35.48$. As in Sect. 4.1, the entropy weight method is used to calculate the weights of the primary indicators that influence the revenue allocation for the participants under the original data collectors; the weights assigned by experts are shown in Table 16.
The information entropy values and entropy weights of the primary indicators for the original data collectors are calculated, resulting in indicator weights of ω 3 = 0.238 , ω 4 = 0.337 , and ω 5 = 0.425 , as shown in Table 17.
The weights of the secondary indicators for the original data collectors are determined, and the classification results of the expert scores are shown in Table 18.
The weights of the secondary indicators that influence the participants under the original data collectors are calculated according to Eqs. (13)-(16), and the specific process is displayed in Table 19. It is worth noting that the elements within the sets in Table 19 correspond to the indices of the scoring experts in Table 18.
Next, the weights of the primary indicators among participants under the data processors are calculated, with the individual expert rating weights shown in Table 21. According to Eqs. (10) and (11), the entropy weights for the primary indicators of the data processors are calculated as $\omega_6 = 0.630$ and $\omega_7 = 0.370$.
The weights of the secondary indicators for the data processors are then determined, with the classification results of expert ratings shown in Table 22. The calculation process for the secondary indicator weights is presented in Table 23. It is worth noting that the elements within the sets in Table 23 correspond to the indices of the scoring experts in Table 22.
Taking into account the entropy weights of the primary indicators, the final weights for each indicator under data cleansing and data analysis are obtained as $\omega_{A_6} = [0.296, 0.334]$ and $\omega_{A_7} = [0.148, 0.222]$. The data analysis situation of the participating enterprises is evaluated using fuzzy evaluation, as shown in Table 24.
By synthesizing both subjective and objective evaluation values and normalizing them, we obtain $\bar{L}_2(1) = 0.296$, $\bar{L}_2(2) = 0.365$, and $\bar{L}_2(3) = 0.339$. Consequently, the modified revenue allocation values for Enterprise 2, Enterprise 3, and Enterprise 4 under the data processors are $\varphi^2_1(v) = 1.564$, $\varphi^2_2(v) = 3.936$, and $\varphi^2_3(v) = 2.820$.
[...] evaluation index system is established, considering the characteristics of different roles. At the first layer, the model allocates revenues among the three roles and introduces risk indicators for adjustment purposes. At the second layer, the model redistributes the adjusted revenues to participating companies under each role, while designing evaluation indicators specific to each role to modify the initial revenue allocation for each company.
Finally, the profits of participating companies under each role are synthesized to obtain the final profit allocation for each company.This two-layer approach, combining the Shapley value with modifications, achieves a fair and effective distribution of road data asset profits.
The case study verifies that the model effectively addresses the revenue allocation issues among multiple roles in the road data asset value chain. The innovation of this model lies in the role categorization and two-layer revenue allocation mechanism, which fully considers the characteristics and contributions of different roles, as well as the differences among participating companies within the same role, thereby achieving a fair and reasonable profit allocation. In addition, the evaluation index system can be flexibly adjusted according to the actual situation. This research provides new ideas and methods for the revenue allocation of road data assets, offering important references for promoting the utilization and circulation of road data assets.

Figure 1. Evaluation indicator system for road data asset revenue allocation.

Figure 2. Architecture of the improved revenue allocation model for road data assets.
Original data collectors: They obtain revenues by collecting original road data, which are primarily generated from enterprises' activities in road construction and operation management. Examples include traffic volume and speed data collected by road authorities through fixed monitoring devices; toll station traffic volume and toll data acquired by toll road operators; real-time traffic conditions and route data collected by map service companies using navigation devices; and vehicle status and road condition data gathered by automakers through onboard devices.

Table 2. Description of evaluation indicators for road data asset revenue allocation.

Table 5. Indicator values for participating enterprises under the original data collectors.

Table 6. Indicator values for participating enterprises under the data processors.

Table 7. Indicator values for participating enterprises under the data product producers.

Table 8. Initial revenue allocation for the three roles.

Table 9. Scoring results of the role's primary indicators.

Table 10. Scoring weights for the role's primary indicators.

Table 11. Process of calculating entropy weights for the role's primary indicators.

Table 12. Scoring results of the role's secondary indicators.

Table 13. Classification results of secondary indicator scores for roles.

Table 14. Process of calculating weights for the role's secondary indicators.

Table 15. The risk indicator evaluation process for each role.

Table 16. The weighting of primary indicator scores for the original data collectors.

Table 17. Process of calculating entropy weights for the primary indicators of the original data collectors.

Table 18. Classification results of secondary indicator scores for the original data collectors.

Table 19. Process of calculating weights for the secondary indicators of the original data collectors.

Revenue allocation among participants of the data processors
Using the traditional Shapley value method, the adjusted profit distribution for the data processors is allocated to Enterprise 2, Enterprise 3, and Enterprise 4, resulting in $\phi^2_1(v) = 1.874$, $\phi^2_2(v) = 3.673$, and $\phi^2_3(v) = 2.773$.

Table 20. The evaluation process of data demand indicators for enterprises participating under the original data collectors.

Table 21. The weighting of primary indicator scores for the data processors.

Table 22. Classification results of secondary indicator scores for the data processors.

Table 24. The evaluation process of data analysis indicators for enterprises participating under the data processors.

Table 27. The final revenue allocation scheme for participating enterprises (unit: ten thousand RMB).